A Faster Algorithm for Approximate String Matching Extended
Abstract
We present a new algorithm for on-line approximate string matching. The algorithm is based on the simulation of a non-deterministic finite automaton built from the pattern and using the text as input. This simulation uses bit operations on a RAM machine with word length $O(\log n)$, where $n$ is the maximum size of the text. The running time achieved is $O(n)$ for small patterns (i.e. pattern length $m = O(\sqrt{\log n})$), independently of the maximum number of errors allowed, $k$. This algorithm is then used to design two general algorithms. One of them partitions the problem into subproblems, while the other partitions the automaton into subautomata. These algorithms are combined to obtain a hybrid algorithm which on average is $O(n)$ for moderate $k/m$ ratios, $O(\sqrt{mk/\log n}\, n)$ for medium ratios, and $O((m-k)kn/\log n)$ for large ratios. We show experimentally that this hybrid algorithm is faster than previous ones for moderate-size patterns, which is the case in text searching.
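The idea of simulating the pattern automaton with bit operations can be illustrated with a minimal sketch. The code below is not the packing scheme of the paper itself; it is the classical row-wise bit-parallel simulation (in the style of Wu and Manber), in which one bit vector per error level holds the active states. The names `approx_search`, `mask` and `rows` are hypothetical, and Python integers stand in for machine words, so the sketch ignores the word-length limit that the complexity bounds above depend on.

```python
def approx_search(pattern, text, k):
    """Return the 0-based end positions in `text` where `pattern`
    matches with at most k edit errors (insert, delete, substitute).

    Illustrative row-wise bit-parallel NFA simulation; not the
    algorithm of the paper, only the same general principle.
    """
    m = len(pattern)
    if m == 0 or k < 0:
        raise ValueError("pattern must be non-empty and k non-negative")

    # mask[c] has bit i set when pattern[i] == c.
    mask = {}
    for i, c in enumerate(pattern):
        mask[c] = mask.get(c, 0) | (1 << i)

    full = (1 << m) - 1      # keeps every vector within m bits
    accept = 1 << (m - 1)    # bit of the final automaton state

    # rows[d] bit i set: pattern[0..i] matches a suffix of the text
    # read so far with at most d errors. Before any text is read,
    # a prefix of length <= d is reachable by d deletions.
    rows = [((1 << d) - 1) & full for d in range(k + 1)]

    ends = []
    for j, c in enumerate(text):
        b = mask.get(c, 0)
        prev_old = rows[0]                       # row d-1 before this character
        rows[0] = ((rows[0] << 1) | 1) & b       # exact Shift-And step
        prev_new = rows[0]                       # row d-1 after this character
        for d in range(1, k + 1):
            old = rows[d]
            rows[d] = ((((old << 1) | 1) & b)    # match
                       | prev_old                # extra text character
                       | (prev_old << 1)         # substitution
                       | (prev_new << 1)         # skipped pattern character
                       | 1) & full               # start state is always active
            prev_old, prev_new = old, rows[d]
        if rows[k] & accept:
            ends.append(j)
    return ends
```

For example, `approx_search("abc", "axc", 1)` reports a match ending at position 2, since one substitution turns `axc` into `abc`. Each text character costs `k + 1` vector updates here, whereas the abstract states a bound that is independent of `k` for small patterns; the sketch is only meant to show what "simulating the automaton with bit operations" looks like.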
Similar resources
Approximate String Matching with Reduced Alphabet
We present a method to speed up approximate string matching by mapping the factual alphabet to a smaller alphabet. We apply the alphabet reduction scheme to a tuned version of the approximate Boyer–Moore algorithm utilizing the Four-Russians technique. Our experiments show that the alphabet reduction makes the algorithm faster. Especially in the k-mismatch case, the new variation is faster tha...
Faster Filters for Approximate String Matching
We introduce a new filtering method for approximate string matching called the suffix filter. It has some similarity with well-known filtration algorithms, which we call factor filters, and which are among the best practical algorithms for approximate string matching using a text index. Suffix filters are stronger, i.e., produce fewer false matches than factor filters. We demonstrate experiment...
A Fast Algorithm for Approximate String Matching on Gene Sequences
Approximate string matching is a fundamental and challenging problem in computer science, for which a fast algorithm is highly demanded in many applications including text processing and DNA sequence analysis. In this paper, we present a fast algorithm for approximate string matching, called FAAST. It aims at solving a popular variant of the approximate string matching problem, the k-mismatch p...
LEAP: A Generalization of the Landau-Vishkin Algorithm with Custom Gap Penalties
Motivation: Approximate string matching is a pivotal problem in the field of computer science. It serves as an integral component of many string algorithms, most notably DNA read mapping and alignment. The improved LV algorithm proposes an improved dynamic programming strategy over the banded Smith-Waterman algorithm but supports only a limited selection of scoring schemes. In this p...
Faster Generation of Super Condensed Neighbourhoods Using Finite Automata
We present a new algorithm for generating super condensed neighbourhoods. Super condensed neighbourhoods have recently been presented as the minimal set of words that represent a pattern neighbourhood. These sets play an important role in the generation phase of hybrid algorithms for indexed approximate string matching. An existing algorithm for this purpose is based on a dynamic programming ap...